ABSTRACT

In this paper, we propose a kernel multi-metric learning algorithm
for multi-channel transient acoustic signal classification. The
proposed method learns a set of metrics jointly for multi-channel
transient acoustic signals in a kernel-induced feature space to
exploit the non-linearity of the data for improving the
classification performance. An effective algorithm is developed for
the task of learning multiple metrics in the kernel space. By
learning the multiple metrics jointly within a single unified
optimization framework, we can learn better metrics to integrate the
multiple channels of the signal for a joint classification.
Experimental results compared with classical as well as recent
algorithms on real-world acoustic datasets verified the effectiveness
of the proposed method.

Index Terms: metric learning, kernel learning, multi-channel
acoustic signal classification

1.  INTRODUCTION 

Transient acoustic signal classification is an important topic in
surveillance and security. Its applications range from daily life to
battlefield tasks [1]. The challenge of transient acoustic signal
classification lies in the fact that the typical environment is not
ideal, but is usually noisy with environmental variations. To handle
the noise and extract useful features for classification, various
techniques have been proposed [1, 2, 3]. In [1], a maximum likelihood
method was proposed for restoring transient signals from a sensor
network with wavelet subband features for classification. The authors
of [2] proposed a denoising technique based on short-time spectral
attenuation for signals from a microphone array for target detection
and localization. In [3], a wavelet packet transformation was adopted
for feature extraction followed by classification. Almost all the
previous algorithms on acoustic signal classification have ignored
the use of multiple measurements, as in the case of multi-channel
signals, for improving the classification performance.
In our previous work [4], a heterogeneous multi-metric learning
(HMML) method for multi-channel transient acoustic signal
classification was developed. Although it was applied only to
multi-channel acoustic signals in [4], the algorithm can potentially
be applied to signals collected from heterogeneous sources. In this
work, we extend our previous work and propose a multi-metric learning
algorithm in the Reproducing Kernel Hilbert Space (RKHS), which can
exploit the non-linearity of the data in the feature space via a
non-linear mapping associated with a kernel. Experimental results
verify the effectiveness of the proposed method over several
conventional and comparable methods.

2.  HETEROGENEOUS  MULTI-METRIC  LEARNING 
MODEL  REVISITED 

The aim of HMML is to learn a projection set $\{P^s\}_{s=1}^S$ (or a
metric set $\{M^s\}_{s=1}^S$, where $M^s = P^{sT} P^s$) adapted to
each channel for improving the joint classification performance.
Given $N$ training samples from $S$ potentially heterogeneous
channels, $\{(\{x_i^s\}_{s=1}^S, y_i)\}_{i=1}^N$, the following model
is used in [4] to learn the metric set:

$$
\text{Minimize} \quad \mathcal{E}(\{P^s\}_{s=1}^S) = (1-\lambda)\,\mathcal{E}_{\text{pull}} + \lambda\,\mathcal{E}_{\text{push}}, \tag{1}
$$

where $P^s$ is the projection matrix for the $s$-th channel, and

$$
\mathcal{E}_{\text{pull}}(\{P^s\}_{s=1}^S) = \sum_{i,\, j \rightsquigarrow i} \sum_{s=1}^S \|P^s(x_i^s - x_j^s)\|^2,
$$

$$
\mathcal{E}_{\text{push}}(\{P^s\}_{s=1}^S) = \sum_{i,\, j \rightsquigarrow i} \sum_{l} (1 - y_{il}) \Big[ 1 + \sum_{s=1}^S \|P^s(x_i^s - x_j^s)\|^2 - \sum_{s=1}^S \|P^s(x_i^s - x_l^s)\|^2 \Big]_+ ,
$$

where $i$ and $l$ are indexes of the training samples and
$j \rightsquigarrow i$ denotes the set of “target” neighbors of
$x_i$, i.e., the $k$ nearest samples with the same label as $x_i$.
$y_{il} \in \{0, 1\}$ is a binary number indicating whether $x_i$ and
$x_l$ are of the same class. $[\cdot]_+ = \max(\cdot, 0)$ is a hinge
loss. The samples contributing to the energy
$\mathcal{E}_{\text{push}}$ are termed “impostors”: samples that lie
within the radius defined by the target neighbors (plus a margin) but
belong to a class different from the target class.
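
As an illustration of how the objective in (1) is assembled from the pull and push terms, the following is a minimal NumPy sketch rather than the implementation of [4]. The function name `hmml_energy`, the list-of-arrays data layout, and the `targets` dictionary of precomputed target neighbors are all assumptions made for this example.

```python
import numpy as np

def hmml_energy(X, y, P, targets, lam=0.5):
    """Evaluate the objective in (1) (illustrative sketch).

    X: list of S arrays, X[s] has shape (N, m_s), one row per sample
    y: array of N class labels
    P: list of S projection matrices, P[s] has shape (d_s, m_s)
    targets: dict mapping sample index i to its target-neighbor indices j
             (hypothetical layout, assumed precomputed)
    lam: trade-off weight lambda between the pull and push terms
    """
    N = len(y)
    # D[i, j] = sum_s ||P^s (x_i^s - x_j^s)||^2
    D = np.zeros((N, N))
    for Xs, Ps in zip(X, P):
        Z = Xs @ Ps.T                          # projected samples, (N, d_s)
        diff = Z[:, None, :] - Z[None, :, :]
        D += (diff ** 2).sum(axis=2)

    pull, push = 0.0, 0.0
    for i, neighbors in targets.items():
        for j in neighbors:
            pull += D[i, j]                    # pull target neighbors closer
            for l in range(N):
                if y[l] != y[i]:               # (1 - y_il) = 1 for other classes
                    push += max(0.0, 1.0 + D[i, j] - D[i, l])
    return (1.0 - lam) * pull + lam * push
```

Only differently labeled samples that fall inside the unit margin around a target neighbor contribute to the push term, which is what makes them impostors.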

To solve (1) effectively, a gradient-based optimization algorithm is
developed in [4]. After the metric set $\{M^s\}_{s=1}^S$ is learned,
we can perform classification by integrating the information from
all the channels. Given a multi-channel test sample
$x_t = \{x_t^s\}_{s=1}^S$, we classify it using the energy-based
classification method given by (3), which can be used for better
classification performance [5]. Denoting the distance between the
multi-channel test sample $x_t$ and a multi-channel training sample
$x_i = \{x_i^s\}_{s=1}^S$ as

$$
D_M(x_t, x_i) = \sum_{s=1}^S d_{M^s}(x_t^s, x_i^s), \tag{2}
$$

the energy-based classification can be achieved via [5]:

$$
\begin{aligned}
y_t = \arg\min_{y_t} \; & (1-\lambda) \sum_{j \rightsquigarrow t} D_M(x_t, x_j) \\
& + \lambda \sum_{j \rightsquigarrow t,\, l} (1 - y_{tl}) \big[ 1 + D_M(x_t, x_j) - D_M(x_t, x_l) \big]_+ \\
& + \lambda \sum_{i,\, j \rightsquigarrow i} (1 - y_{it}) \big[ 1 + D_M(x_i, x_j) - D_M(x_i, x_t) \big]_+ .
\end{aligned} \tag{3}
$$

The first term in (3) represents the accumulated energy for the $k$
target neighbors of $x_t$; the second term accumulates the hinge loss
over all the impostors for $x_t$; the third term represents the
accumulated energy for differently labeled samples whose neighbor
perimeters are invaded by $x_t$, i.e., samples that take $x_t$ as
their impostor.
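
The decision rule in (3) can be sketched as follows: each candidate label is tried for the test sample, the three energy terms are accumulated, and the label with the lowest energy wins. This is a hedged illustration, not the code used in [4] or [5]. The names `multi_channel_sqdist` and `classify_energy`, the `targets` dictionary, and the choice of squared Euclidean distance after projection for $d_{M^s}$ are assumptions of this example.

```python
import numpy as np

def multi_channel_sqdist(A, B, P):
    """Eq. (2): sum over channels of ||P^s (a^s - b^s)||^2.

    A[s]: (n_a, m_s), B[s]: (n_b, m_s), P[s]: (d_s, m_s); returns (n_a, n_b).
    """
    total = 0.0
    for As, Bs, Ps in zip(A, B, P):
        Za, Zb = As @ Ps.T, Bs @ Ps.T
        total = total + ((Za[:, None, :] - Zb[None, :, :]) ** 2).sum(axis=2)
    return total

def classify_energy(D_train, d_test, y, targets, k=3, lam=0.5):
    """Energy-based classification in the spirit of eq. (3) (sketch).

    D_train: (N, N) distances among training samples
    d_test:  (N,) distances from the test sample to each training sample
    targets: dict i -> target-neighbor indices of training sample i
             (hypothetical layout, assumed precomputed)
    """
    best_label, best_energy = None, np.inf
    for c in np.unique(y):
        same = np.flatnonzero(y == c)
        other = np.flatnonzero(y != c)
        # Target neighbors of x_t if it had label c: k nearest samples of class c.
        tn = same[np.argsort(d_test[same])[:k]]
        # Term 1: pull energy over the target neighbors of x_t.
        energy = (1.0 - lam) * d_test[tn].sum()
        # Term 2: hinge loss over the impostors of x_t.
        for j in tn:
            energy += lam * np.maximum(0.0, 1.0 + d_test[j] - d_test[other]).sum()
        # Term 3: x_t acting as an impostor for differently labeled samples.
        for i in other:
            for j in targets[i]:
                energy += lam * max(0.0, 1.0 + D_train[i, j] - d_test[i])
        if energy < best_energy:
            best_label, best_energy = c, energy
    return best_label
```

For a single channel with identity projection this reduces to the single-metric energy-based rule; with several channels, only `multi_channel_sqdist` changes.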

3.  MULTI-METRIC  LEARNING  IN  THE 
REPRODUCING  KERNEL  HILBERT  SPACE 

In this section, we present a multi-metric learning method in the
high-dimensional feature space induced by a kernel mapping, as a
generalization of the HMML method in [4]. We refer to this method as
kernel-based HMML (KHMML) in the sequel.

3.1.  Multi-Metric  Learning  in  Kernel  Space 

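By introducing a non-linear feature mapping function
$\phi(\cdot): \mathbb{R}^m \to \mathbb{R}^n$ with $n \gg m$, and
denoting $\phi_i^s = \phi(x_i^s)$, we can formulate the KHMML model
as follows:

$$
\text{Minimize} \quad \mathcal{E}(\{P^s\}_{s=1}^S) = (1-\lambda)\,\mathcal{E}_{\text{pull}} + \lambda\,\mathcal{E}_{\text{push}}, \tag{4}
$$

where $P^s$ is the projection matrix for the $s$-th channel in the
kernel space, and

$$
\mathcal{E}_{\text{pull}}(\{P^s\}_{s=1}^S) = \sum_{i,\, j \rightsquigarrow i} \sum_{s=1}^S \|P^s(\phi_i^s - \phi_j^s)\|^2,
$$

$$
\mathcal{E}_{\text{push}}(\{P^s\}_{s=1}^S) = \sum_{i,\, j \rightsquigarrow i} \sum_{l} (1 - y_{il}) \Big[ 1 + \sum_{s=1}^S \|P^s(\phi_i^s - \phi_j^s)\|^2 - \sum_{s=1}^S \|P^s(\phi_i^s - \phi_l^s)\|^2 \Big]_+ .
$$

By differentiating (4) with respect to $P^s$ at the current iterate
$P_t^s$, we get the following expression (the constant factor 2 is
omitted, as it can be absorbed into the step size):

$$
\begin{aligned}
Q_t^s = \frac{\partial \mathcal{E}(\{P_t^s\}_{s=1}^S)}{\partial P^s}
&= (1-\lambda)\, P_t^s \sum_{i,\, j \rightsquigarrow i} (\phi_i^s - \phi_j^s)(\phi_i^s - \phi_j^s)^T \\
&\quad + \lambda\, P_t^s \sum_{(i,j,l) \in \mathcal{N}_t} \big[ (\phi_i^s - \phi_j^s)(\phi_i^s - \phi_j^s)^T - (\phi_i^s - \phi_l^s)(\phi_i^s - \phi_l^s)^T \big],
\end{aligned} \tag{5}
$$

where $\mathcal{N}_t$ is the set of triple-indices defined by
$(i, j, l) \in \mathcal{N}_t$ if and only if $(i, j, l)$ triggers the
hinge loss in $\mathcal{E}_{\text{push}}$. Note that in this case, as
the dimensionality of the RKHS induced by $\phi(\cdot)$ may be
infinite, it is not possible to update the projection set $\{P^s\}$
directly. Therefore, to learn the projection set $\{P^s\}$ in the
kernel space, we adopt a parametric representation for it as a linear
combination of the feature vectors, in the form
$P^s = \Theta^s \Phi^{sT}$, where $\Theta^s$ is referred to as the
combination coefficient matrix and $\Phi^s$ denotes the data matrix
in the RKHS constructed from the $N$ training samples for channel $s$
as $\Phi^s = [\phi_1^s, \phi_2^s, \cdots, \phi_N^s]$. Then the
projection of a sample $x_t^s$ with $P^s$ in the RKHS can be computed
as:

$$
P^s \phi(x_t^s) = \Theta^s \Phi^{sT} \phi_t^s = \Theta^s
\begin{bmatrix} k(x_1^s, x_t^s) \\ k(x_2^s, x_t^s) \\ \vdots \\ k(x_N^s, x_t^s) \end{bmatrix}
= \Theta^s k_t^s, \tag{6}
$$

where $k(\cdot, \cdot)$ is the kernel function associated with the
feature mapping function $\phi(\cdot)$; specifically,
$k(x_1, x_2) = \phi(x_1)^T \phi(x_2)$. Substituting
$P^s = \Theta^s \Phi^{sT}$ into (5) and using (6), we get:

$$
\begin{aligned}
Q_t^s &= (1-\lambda)\, \Theta_t^s \Phi^{sT} \sum_{i,\, j \rightsquigarrow i} (\phi_i^s - \phi_j^s)(\phi_i^s - \phi_j^s)^T \\
&\quad + \lambda\, \Theta_t^s \Phi^{sT} \sum_{(i,j,l) \in \mathcal{N}_t} \big[ (\phi_i^s - \phi_j^s)(\phi_i^s - \phi_j^s)^T - (\phi_i^s - \phi_l^s)(\phi_i^s - \phi_l^s)^T \big] \\
&= (1-\lambda)\, \Theta_t^s \sum_{i,\, j \rightsquigarrow i} (k_i^s - k_j^s)(\phi_i^s - \phi_j^s)^T \\
&\quad + \lambda\, \Theta_t^s \sum_{(i,j,l) \in \mathcal{N}_t} \big[ (k_i^s - k_j^s)(\phi_i^s - \phi_j^s)^T - (k_i^s - k_l^s)(\phi_i^s - \phi_l^s)^T \big].
\end{aligned} \tag{7}
$$

Note that the term $(k_i^s - k_j^s)(\phi_i^s - \phi_j^s)^T$ can be
reformulated as follows:

$$
\begin{aligned}
(k_i^s - k_j^s)(\phi_i^s - \phi_j^s)^T
&= (k_i^s - k_j^s)\,\phi_i^{sT} - (k_i^s - k_j^s)\,\phi_j^{sT} \\
&= A_i^{(k_i^s - k_j^s)} \Phi^{sT} - A_j^{(k_i^s - k_j^s)} \Phi^{sT} \\
&= \big[ A_i^{(k_i^s - k_j^s)} - A_j^{(k_i^s - k_j^s)} \big] \Phi^{sT},
\end{aligned} \tag{8}
$$

where $A_i^{(x)}$ is a matrix constructed by using $x$ as its $i$-th
column vector and zeros elsewhere:

$$
A_i^{(x)} = [\, \underbrace{0, \cdots, 0}_{i-1 \text{ columns}},\; x,\; 0, \cdots, 0 \,]. \tag{9}
$$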
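
The identity in (8), together with the column-placement matrix of (9), can be checked numerically. The sketch below replaces the (possibly infinite-dimensional) feature map with an explicit random finite-dimensional matrix `Phi`, which is purely an assumption for illustration; the helper name `A_col` and the 0-based column index are likewise choices of this example.

```python
import numpy as np

def A_col(x, i, N):
    """Eq. (9): N x N matrix with x as its i-th column (0-based), zeros elsewhere."""
    A = np.zeros((N, N))
    A[:, i] = x
    return A

rng = np.random.default_rng(0)
n, N = 5, 4
Phi = rng.normal(size=(n, N))   # columns stand in for phi_1, ..., phi_N
K = Phi.T @ Phi                 # kernel matrix; column i is the kernel vector k_i

i, j = 0, 2
k_diff = K[:, i] - K[:, j]

# Left-hand side of (8): (k_i - k_j)(phi_i - phi_j)^T
lhs = np.outer(k_diff, Phi[:, i] - Phi[:, j])
# Right-hand side of (8): [A_i^{(k_i - k_j)} - A_j^{(k_i - k_j)}] Phi^T
rhs = (A_col(k_diff, i, N) - A_col(k_diff, j, N)) @ Phi.T

identity_holds = np.allclose(lhs, rhs)
```

The point of the rewrite is that everything to the left of $\Phi^{sT}$ depends only on kernel evaluations, so it can be stored and updated as an ordinary $N \times N$ matrix.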
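Substituting (8) into (7), we get:

$$
\begin{aligned}
Q_t^s &= (1-\lambda)\, \Theta_t^s \sum_{i,\, j \rightsquigarrow i} \big[ A_i^{(k_i^s - k_j^s)} - A_j^{(k_i^s - k_j^s)} \big] \Phi^{sT} \\
&\quad + \lambda\, \Theta_t^s \sum_{(i,j,l) \in \mathcal{N}_t} \big[ A_i^{(k_i^s - k_j^s)} - A_j^{(k_i^s - k_j^s)} - A_i^{(k_i^s - k_l^s)} + A_l^{(k_i^s - k_l^s)} \big] \Phi^{sT} \\
&= \Omega^s \Phi^{sT},
\end{aligned}
$$

where

$$
\Omega^s = (1-\lambda)\, \Theta_t^s \sum_{i,\, j \rightsquigarrow i} \big[ A_i^{(k_i^s - k_j^s)} - A_j^{(k_i^s - k_j^s)} \big]
+ \lambda\, \Theta_t^s \sum_{(i,j,l) \in \mathcal{N}_t} \big[ A_i^{(k_i^s - k_j^s)} - A_j^{(k_i^s - k_j^s)} - A_i^{(k_i^s - k_l^s)} + A_l^{(k_i^s - k_l^s)} \big].
$$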

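By the above derivation, we have expressed the gradient in the kernel
space in the form $\Omega^s \Phi^{sT}$, where $\Omega^s$ depends on
the data only through the kernel vectors $k_i^s$. Therefore, we can
represent the updated projection $P_{t+1}^s$ by using the updated
combination coefficient matrix $\Theta_{t+1}^s$ at time step $t+1$
as:

$$
P_{t+1}^s \leftarrow P_t^s - \alpha Q_t^s = \Theta_t^s \Phi^{sT} - \alpha\, \Omega^s \Phi^{sT}
= (\Theta_t^s - \alpha\, \Omega^s)\, \Phi^{sT} = \Theta_{t+1}^s \Phi^{sT}, \tag{10}
$$

where $\alpha$ is the step size. Therefore, by (10), we can learn the
projection matrix $P^s$ in the RKHS, which cannot be manipulated
directly, by updating only the coefficient matrix $\Theta^s$. The
above derivation is inspired by [6], which is designed for single
metric learning. The major steps of the proposed KHMML algorithm are
summarized in Algorithm 1.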
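
One iteration of the update in (10) can be sketched in NumPy as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names `khmml_step` and `_accumulate`, the `targets` dictionary of fixed target neighbors, the default step size, and the dense $N \times N$ accumulator are all choices made for this example.

```python
import numpy as np

def _accumulate(C, Ks, i, j, w):
    """Add w * [A_i^{(k_i - k_j)} - A_j^{(k_i - k_j)}] to C in place."""
    v = w * (Ks[:, i] - Ks[:, j])
    C[:, i] += v
    C[:, j] -= v

def khmml_step(K, Theta, y, targets, lam=0.5, alpha=1e-3):
    """One gradient step on every coefficient matrix Theta^s (sketch).

    K:     list of S kernel matrices, each (N, N); column i is k_i^s
    Theta: list of S coefficient matrices, each (d, N)
    targets: dict i -> target-neighbor indices (hypothetical, precomputed)
    Returns the updated list of Theta^s and the active triplet set.
    """
    N = len(y)
    # Joint kernel-space distance: D[i, j] = sum_s ||Theta^s (k_i^s - k_j^s)||^2
    D = np.zeros((N, N))
    for Ks, Ts in zip(K, Theta):
        Z = (Ts @ Ks).T                      # row i is Theta^s k_i^s
        D += ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)

    # Active set: triplets (i, j, l) that currently trigger the hinge loss.
    active = [(i, j, l)
              for i, nbrs in targets.items() for j in nbrs
              for l in range(N)
              if y[l] != y[i] and 1.0 + D[i, j] - D[i, l] > 0.0]

    new_Theta = []
    for Ks, Ts in zip(K, Theta):
        C = np.zeros((N, N))                 # sum of the A-matrices in Omega^s
        for i, nbrs in targets.items():      # pull part, weight (1 - lambda)
            for j in nbrs:
                _accumulate(C, Ks, i, j, 1.0 - lam)
        for i, j, l in active:               # push part, weight lambda
            _accumulate(C, Ks, i, j, lam)
            _accumulate(C, Ks, i, l, -lam)
        omega = Ts @ C                       # Omega^s = Theta_t^s * C
        new_Theta.append(Ts - alpha * omega) # Theta_{t+1}^s, as in (10)
    return new_Theta, active
```

The active set is computed from the joint distance over all channels, which is what couples the per-channel updates; calling `khmml_step` repeatedly until the objective stops decreasing gives a loop of the kind Algorithm 1 describes.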

3.2.  Multi-Channel  Signal  Classification  in  Kernel  Space 

After we learn the combination coefficient matrix set
$\{\Theta^s\}_{s=1}^S$, we can use it to classify the test samples.
To do that, we first show how to calculate the distance in the kernel
space with the learned kernel metric set. The distance between the
$i$-th and $j$-th samples in channel $s$ can be calculated as:

$$
d_{M^s}(\phi(x_i^s), \phi(x_j^s)) = \|P^s \phi(x_i^s) - P^s \phi(x_j^s)\|_2^2 = \|\Theta^s (k_i^s - k_j^s)\|_2^2 .
$$

By substituting this into (2) and (3), we can perform classification
for the test sample with the learned metric in the RKHS.
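
The kernel-space distance only needs kernel vectors and the learned coefficients, as the following minimal sketch shows. The Gaussian kernel parameterization $\exp(-\|a-b\|^2 / (2\sigma^2))$ is an assumption (the text specifies a Gaussian kernel and a bandwidth but not the exact form), and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.8):
    """Gaussian kernel matrix between rows of A and rows of B.

    Assumed form: k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    """
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def kernel_metric_sqdist(Theta, K, i, j):
    """||Theta (k_i - k_j)||_2^2, where k_i is column i of the kernel matrix K."""
    v = Theta @ (K[:, i] - K[:, j])
    return float(v @ v)
```

For a test sample, the same formula applies with its kernel vector against the training set (one extra column computed by `gaussian_kernel(X_train, x_test[None, :])`) in place of `K[:, i]`.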

4.  EXPERIMENT  RESULTS 

In this section, we carry out experiments on a number of real
acoustic datasets and compare the results with several conventional
classification methods to verify the effectiveness of the proposed
method. We use the multi-channel transient acoustic dataset collected
for launch and impact of different weapons (mortar and rocket) using
a tetrahedral acoustic sensor array. For each event, the acoustic
sensor array measures the signal from a launch/impact event using
four acoustic sensors simultaneously. We have a total of four
datasets (referred to as Datasets 1 to 4) [7]. Among these four
datasets, some consist of four subsets collected by an acoustic
sensor array deployed at four different physical sites. We first
segment the raw signal with spectral maximum detection [8] in order to
locate the physical event and then extract the first 50 cepstral
coefficients (starting from the second coefficient) [9] for
classification.

Algorithm 1: Kernel Multi-Metric Learning.

Input: training set $\{(\{x_i^s\}_{s=1}^S, y_i)\}_{i=1}^N$, number of
    nearest neighbors $L$, kernel function $k(\cdot, \cdot)$
Output: combination coefficient matrix set $\{\Theta^s\}_{s=1}^S$
    used for multi-metrics in kernel space
Initialize: $t \leftarrow 0$, $\{\Theta_0^s\}_{s=1}^S$, $\mathcal{N}_t = \{\}$;
while convergence condition false do
    Update the active set $\mathcal{N}_{t+1}$ by collecting the triplets
    $(i, j, l)$ that incur the hinge loss in kernel space;
    for $s = 1, 2, \cdots, S$ do
        Compute the gradient $\Omega^s$ for the $s$-th channel;
        Take gradient step for the combination coefficient matrix
        of the $s$-th channel:
            $\Theta_{t+1}^s \leftarrow \Theta_t^s - \alpha\, \Omega^s$;
    $t \leftarrow t + 1$

Table 1. Classification accuracy for the two-class mortar problem
(S = 4, L = 3, r = 0.5).

Method      Dataset 1   Dataset 2   Dataset 3   Dataset 4   Average
Logistic    0.7778      0.8069      0.7183      0.6857      0.7472
SVM         0.8073      0.7991      0.7917      0.7693      0.7919
CSVM        0.8173      0.8448      0.7938      0.8000      0.8140
HMML [4]    0.8673      0.8621      0.8525      0.8240      0.8515
JSRC [7]    0.8515      0.8828      0.8857      0.8147      0.8534
KHMML       0.8728      0.8828      0.8607      0.8360      0.8631

To evaluate the effectiveness of the proposed method, we compare the
results with different classical algorithms, including sparse linear
multinomial Logistic Regression [10] and a linear Support Vector
Machine (SVM) [11]. The SVM is used in two modes in our experiments:
(1) treating each sensor signal separately (SVM); (2) concatenating
all the signals from different sensors (CSVM). A one-vs.-all scheme
is used for the SVM in the case of multi-class classification. The
joint sparse representation-based classification method (JSRC) [7]
and our previously proposed HMML method [4] are also compared. For
KHMML, a Gaussian kernel is used in our experiments, with bandwidth
$\sigma = 0.8$, which gives desirable results empirically. The number
of nearest neighbors is set as $L = 3$. The combination weight is set
as $\lambda = 0.5$.

4.1.  Two-Class  Event  Classification 

In this experiment, we focus on the classification problem between
launch and impact for a single kind of weapon (mortar) using all four
datasets. We randomly split each dataset into two halves (training
ratio r = 0.5) for training and testing and run the experiment five
times. We report the average performance in Table 1 for the four
datasets. It can be seen that KHMML outperforms HMML on all the
datasets, indicating the effectiveness of multi-metric learning in
the kernel space, which exploits the non-linearity of the data rather
than relying on a linear model. Furthermore, the proposed KHMML
method performs better overall than the joint sparse
representation-based method [7] (average accuracy 0.8631 versus
0.8534), which has been shown to be effective for the multi-channel
transient acoustic signal classification task.

Table 2. Classification accuracy for the four-class problem
(S = 4, L = 3, r = 0.5).

Method      Dataset 1   Dataset 2   Dataset 3   Dataset 4   Average
Logistic    0.7440      0.7234      0.6882      0.7367      0.7231
SVM         0.7410      0.7227      0.6860      0.7474      0.7243
CSVM        0.7487      0.7375      0.6945      0.7169      0.7244
KNN         0.6204      0.7188      0.6236      0.7456      0.6771
HMML [4]    0.8014      0.7313      0.7284      0.7928      0.7635
JSRC [7]    0.8152      0.7969      0.7494      0.7928      0.7886
KHMML       0.8252      0.7862      0.7512      0.8632      0.8065

4.2.  Four-Class  Event  Classification 

To further verify the effectiveness of the proposed method, we test
it on a four-class classification problem, where we want to decide
whether the event is launch or impact and whether the weapon is
mortar or rocket, which is much more challenging. As before, we
generate training and testing datasets by randomly splitting each
dataset into two halves. We repeat the experiment five times and
report the average performance for each dataset as well as the
overall average classification accuracy in Table 2. Again, the KHMML
method improves the classification accuracy over HMML on all four
datasets, by 2.3 to 7.0 percentage points, and performs comparably to
or better than JSRC on the individual datasets.
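
The evaluation protocol of Sections 4.1 and 4.2 (random split with training ratio r = 0.5, five repetitions, averaged accuracy) can be sketched as follows. The function name `repeated_holdout` and the `fit_predict` callback interface are assumptions of this example; any classifier, including the metric-learning methods compared here, would be wrapped in that callback.

```python
import numpy as np

def repeated_holdout(y, fit_predict, r=0.5, repeats=5, seed=0):
    """Average test accuracy over repeated random train/test splits.

    y: array of N labels
    fit_predict: callable(train_idx, test_idx) -> predicted labels for test_idx
                 (hypothetical interface wrapping any classifier)
    r: fraction of samples used for training
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    n_train = int(round(r * len(y)))
    accuracies = []
    for _ in range(repeats):
        perm = rng.permutation(len(y))
        train_idx, test_idx = perm[:n_train], perm[n_train:]
        y_pred = np.asarray(fit_predict(train_idx, test_idx))
        accuracies.append(np.mean(y_pred == y[test_idx]))
    return float(np.mean(accuracies))
```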

4.3.  Considering  the  Effects  of  Sensor  Sites 

In this experiment, to investigate the classification performance
using data captured by sensors at different physical sites, we
generate training and testing datasets according to the physical
sites where the acoustic sensor array is deployed. Specifically,
Dataset 2 contains subsets collected from four different sites. We
keep all the data from one site for testing and use the data from all
the other sites for training; we repeat this for each site. The
classification results are summarized in Table 3. The proposed KHMML
method achieves the highest accuracy at three of the four sites and
the best average accuracy (0.8157), indicating that it is more robust
to the sensors' site locations and confirming the benefit of joint
multi-metric learning in the kernel space.
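
The site-wise split described above amounts to a leave-one-site-out scheme, which can be sketched as below. The function name `leave_one_site_out` and the per-sample `site_ids` array are assumptions of this example, since the text does not describe how site membership is stored.

```python
import numpy as np

def leave_one_site_out(site_ids):
    """Yield (site, train_idx, test_idx) with each site held out in turn.

    site_ids: array of length N giving the physical site of each sample
              (hypothetical layout)
    """
    site_ids = np.asarray(site_ids)
    for site in np.unique(site_ids):
        test_idx = np.flatnonzero(site_ids == site)
        train_idx = np.flatnonzero(site_ids != site)
        yield site, train_idx, test_idx
```

Because no sample from the held-out site is seen in training, this split measures how well the learned metrics transfer across deployment locations rather than across random subsets of the same recordings.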

Table 3. Classification accuracy for the four-class classification
with training and testing on data measured at different physical
sites (S = 4, L = 3).

                        Physical Sites
Method      Site 1      Site 2      Site 3      Site 4      Average
Logistic    0.6797      0.6829      0.7314      0.7109      0.6394
SVM         0.6901      0.7134      0.8005      0.6652      0.6449
CSVM        0.7292      0.7073      0.7766      0.7043      0.6574
HMML [4]    0.7917      0.7805      0.7766      0.7391      0.7627
JSRC [7]    0.8125      0.8049      0.7447      0.7652      0.7818
KHMML       0.8229      0.8537      0.8298      0.7565      0.8157

5.  CONCLUSIONS

We have presented in this paper an effective method to jointly learn
a set of metrics in the kernel space for multi-channel transient
acoustic signal classification. By exploiting the non-linearity in
the data via kernel mapping, we are able to learn a set of metrics
adapted to each channel jointly for improving the classification
performance. Experiments on real-world multi-sensor datasets,
compared with several conventional as well as recently developed
methods, verified the effectiveness of the proposed method. The
method developed in this paper is not limited to acoustic signals and
is readily applicable to other classification tasks.